For this project, we explored two datasets: tweets about the Covid-19 vaccine and reported side effects of the vaccine. First, we looked at what people discuss when they tweet about the Covid-19 vaccine and analyzed the sentiment of those tweets to gauge attitudes toward the vaccine. We then narrowed the focus to side-effect discussions to see what is commonly mentioned and where the tweets were sent from. Second, we examined the adverse reactions reported from 2020-12-01 to 2021-03-31. The visualizations aim to show who reports side effects, how they compare to the general population, and which symptoms are reported most often.
load("./data/term_stemmed_all.RData")
This word cloud shows the popular keywords (those appearing more than 600 times) in tweets about the Covid-19 vaccine.
The most frequently mentioned word is, unsurprisingly, “vaccine”, followed by “moderna” and “covid”, while “pfizer” and “pfizerbiontech” are much smaller.
We can also see alternative vaccine-related terms (“covaxin”, “covidvaccine”) and keywords commonly used to describe vaccination experiences (“dose”, “receive”, “today”), all of which are mentioned many times.
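The frequency cutoff behind the word cloud can be sketched as follows. This is a minimal illustration, not the code used for the figure: the `terms` vector and the cutoff of 2 are hypothetical stand-ins for the stemmed tweet terms and the 600-occurrence threshold.

```r
# Hypothetical toy input standing in for the stemmed terms of all tweets
terms <- c("vaccine", "moderna", "vaccine", "covid", "vaccine", "moderna", "dose")

# Hypothetical cutoff; the word cloud described above uses 600 on the full data
min_count <- 2

# Count each term, then keep only terms appearing more than min_count times
term_counts <- sort(table(terms), decreasing = TRUE)
frequent_terms <- term_counts[term_counts > min_count]

print(frequent_terms)
```

Raising the cutoff trades coverage for legibility, which is presumably why a much lower cutoff is used for the smaller side-effect subset later in the report.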
load("./data/coocedge.Rdata")
load("./data/uniquedges.Rdata")
We cleaned the text of all vaccine-related tweets and applied VADER sentiment analysis to classify each tweet into one of three categories: positive, neutral, or negative.
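The three-way split can be sketched as below. This assumes the classification is based on VADER's compound score with the conventional cutoffs of ±0.05; the report does not state the cutoffs actually used, and both `classify_compound` and the example scores are hypothetical.

```r
# Map a VADER compound score (in [-1, 1]) to a sentiment label.
# The +/-0.05 cutoff is the conventional VADER choice, assumed here.
classify_compound <- function(compound, cutoff = 0.05) {
  ifelse(compound >= cutoff, "positive",
         ifelse(compound <= -cutoff, "negative", "neutral"))
}

# Hypothetical compound scores for four tweets
scores <- c(0.62, 0.00, -0.41, 0.03)
labels <- classify_compound(scores)

print(table(labels))
```

A wider cutoff would move more weakly worded tweets into the neutral class, so the share of neutral tweets depends on this choice.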
This lets us see people’s attitudes toward vaccines.
The bar chart shows that most tweets about vaccines are neutral or positive; negative tweets are comparatively rare.
The number of tweets posted over time follows a similar trend for all three sentiment types, possibly because no particular event shifted people’s attitudes during this period.
We selected the 50 most common words for each type of tweet.
Looking at the common words in positive, neutral, and negative tweets, we find that positive tweets share happiness about the arrival of vaccines and give positive feedback after receiving a shot; neutral tweets are mostly objective statements of vaccine news or information; and negative tweets express worry about side effects and about whether the vaccines will work.
We can also look at the sentiment of the most popular tweets, ranked by number of favorites. Most of the top 15 are neutral or positive, which suggests that people did not show a preference for negative tweets.
Ranking by number of retweets gives a similar picture. Most of the top 15 are again neutral or positive, which suggests that people did not preferentially retweet negative tweets and may have held a positive attitude toward the effect of vaccines.
We also checked the tweets of popular users (ranked by number of followers), because they have considerable influence on the public. Most of these users mainly post objective statements of vaccine news or information, and they show more positive than negative sentiment.
Side effects are not a heated topic in the Twitter discussion. Only about 1 in 50 tweets about the Covid-19 vaccine mentions side effects.
Because the dataset shrinks considerably when restricted to side-effect discussions, this word cloud covers words that appear more than 20 times. The most frequently mentioned term is, unsurprisingly, “side effect”, followed by “moderna” and “vaccine”, while “pfizer” and “pfizerbiontech” are much smaller, in line with what we found across all Covid-19 vaccine tweets.
We can also see the keywords “covid”, “day”, “shot”, “arm”, and “dose”, which suggests that many of these tweets record people’s vaccination experiences.
Words like “sore”, “fatigue”, “pain”, and “headache” indicate the side effects most commonly mentioned in tweets.
# the relative volume of manufacturer names in side effect discussions compared with that in all tweets
library(dplyr)
library(ggplot2) # needed for the bar chart below
load("./data/term_stemmed_all.RData")
manu <- stemmed_all %>% filter(term %in% c("moderna","pfizer","pfizerbiontech"))
unimanu <- manu[,c(2,4)] %>% unique()
# moderna:11222
# pfizer:7257
load("./data/se_term_stemmed_all.RData")
manuse <- stemmed %>% filter(term %in% c("moderna","pfizer","pfizerbiontech"))
unimanuse <- manuse[,c(2,4)] %>% unique()
# moderna:416
# pfizer:158
# n1 <- c(11222,7257)
# n2 <- c(416,158)
# manufacturer <- c("moderna","pfizer")
# cpm <- data.frame(manufacturer,n1,n2)
# cpm <- cpm %>% mutate(p = n2/n1)
manufacturer <- c("moderna","moderna","moderna","pfizer","pfizer","pfizer")
group <- c("all","side effect","relative","all","side effect","relative")
n <- c(11222,416,0.037,7257,158,0.022)
cpm <- data.frame(manufacturer,group,n)
cpm[c(-3,-6),] %>% ggplot() +
geom_col(aes(manufacturer,n,fill=group),position = position_dodge2(preserve = "single")) +
labs(title = "Occurrences of manufacturer's names ",
x="Manufacturer",
y = "Occurrences") +
theme_minimal() +
scale_fill_manual(values = c("#fdb863","#2d004b"))
Moderna is mentioned more often than Pfizer, both in all tweets and in tweets about side effects.
Looking at the share of side-effect mentions among all mentions for each manufacturer, Moderna still scores higher than Pfizer (about 3.7% versus 2.2%). This may hint that Moderna is more likely to trigger side effects, but it is only a hypothesis to be tested in later analyses.
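One way to start testing this is a two-sample test of proportions on the counts recorded in the code above. This is only a sketch: it treats every mention as an independent observation, which is an assumption (one user may tweet many times), and the variable names are introduced here for illustration.

```r
# Counts taken from the comments in the code above
mentions_all <- c(moderna = 11222, pfizer = 7257)  # mentions in all tweets
mentions_se  <- c(moderna = 416,   pfizer = 158)   # mentions in side-effect tweets

# Share of side-effect mentions per manufacturer
share <- mentions_se / mentions_all
print(round(share, 3))

# Two-sample test for equality of proportions (stats::prop.test)
test <- prop.test(x = mentions_se, n = mentions_all)
print(test$p.value)
```

Even a small p-value here would only say that the two manufacturers are discussed differently on Twitter, not that one vaccine causes more side effects.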
packages <- c("devtools","knitr","tidyverse","widgetframe","readr",
"wordcloud", "base64enc", "tidytext",
"RWeka","stats","manifestoR","readtext",
"rvest", "stringr",
"SnowballC", "plotrix", "tidyr", "tidytext", "stats",
"dendextend", "ggthemes",
"httr","jsonlite", "DT", "textdata", "ggmap","maptools","mapproj","rgeos","rgdal",
"RColorBrewer", "stringr","scales", "leaflet", 'leafpop', "ggthemes", "ggtext", "wordcloud")
packages <- lapply(packages, FUN = function(x) {
if(!require(x, character.only = TRUE)) {
install.packages(x)
library(x, character.only = TRUE)
}
}
)
vac <- read_csv("data/hanyudata/2021VAERSDATA.csv")
sym <- read_csv("data/hanyudata/2021VAERSSYMPTOMS.csv")
vax <- read_csv("data/hanyudata/2021VAERSVAX.csv")
#convert vaccination date
vac$VAX_DATE <- as.Date(vac$VAX_DATE, format = "%m/%d/%Y")
#merge the first two csv file by patient ID
merge1 <- left_join(vac, sym, by = "VAERS_ID")
#merge the third csv together
merge <- left_join(merge1, vax, by = "VAERS_ID")
#filter for COVID19 vaccine
covid <- merge %>%
filter(VAX_TYPE == "COVID19")
#find out patient's age and gender
agesex <- covid %>%
distinct(VAERS_ID,.keep_all = TRUE) %>% #distinct patients
select(AGE_YRS, SEX, VAX_DATE) %>%
filter(!is.na(AGE_YRS), AGE_YRS >= 18, VAX_DATE >= "2020-12-01", SEX != "U") #keep adults with a known age, vaccination dates on or after 2020-12-01, and a known sex
as <- agesex %>%
group_by(SEX,AGE_YRS) %>%
count(SEX,AGE_YRS)
ggplot(as, aes(x = AGE_YRS, y = n, color = SEX))+
geom_line(size = 1)+
labs(title = "Covid Vaccine Side Effects Reported Based on Age and Sex<br> By Mar 31, 2021",x="Age",y = "Number of People Reported") +
scale_x_continuous(breaks = seq(20,110,10), limits = c(15, 110)) + scale_y_continuous(breaks = seq(0,600,200), limits = c(0, 600))+
theme_minimal() + theme(plot.title = element_markdown(hjust = 0.5),legend.title = element_blank()) + scale_color_manual(values = c("#fdb863","#2d004b"),labels = c("Female", "Male"))
In general, women and younger people appear to report more side effects. The number of reports decreases with age, especially for women. So do older people suffer more from side effects? Apparently not. But could it simply be that fewer older people had been vaccinated, and hence there were fewer reports? We decided to dig deeper into who got vaccinated by looking at different age groups.
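Comparing reports with vaccination counts requires binning individual ages into the CDC age groups. The report does this with nested `ifelse()` calls; a more compact alternative using base R's `cut()` is sketched here. The `ages` vector is hypothetical.

```r
# Hypothetical ages in years
ages <- c(18, 29, 30, 45, 64, 65, 74, 75, 90)

# Left-closed intervals: [18,30), [30,40), [40,50), [50,65), [65,75), [75,Inf)
age_group <- cut(ages,
                 breaks = c(18, 30, 40, 50, 65, 75, Inf),
                 labels = c("18-29", "30-39", "40-49", "50-64", "65-74", "75+"),
                 right = FALSE)

print(table(age_group))
```

One difference from the nested `ifelse()` version: a fractional age such as 29.5 falls into "18-29" here, whereas integer-bounded conditions would leave it in a leftover category.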
#vaccinated number by age group from cdc
agegroup <- read_csv("data/hanyudata/demographic_trends_of_people_receiving_covid19_vaccinations_in_the_united_states.csv")
agegroup$Date <- as.Date(agegroup$Date, format = "%Y/%m/%d")
ageg <- agegroup %>%
filter(Date <= "2021-03-31") %>% #filter for date before 2021/03/31
filter(`Age Group` != "Age_unknown" & `Age Group` != "Age_known" & `Age Group` != "Ages_<18yrs") %>% #filter for age > 18
select(Date, `Age Group`, `People with at least one dose`, `Percent of age group with at least one dose`, Census)
ageg$Age <- with(ageg, ifelse(`Age Group` == "Ages_18-29_yrs", "18-29",
ifelse(`Age Group` == "Ages_30-39_yrs", "30-39",
ifelse(`Age Group` == "Ages_40-49_yrs", "40-49",
ifelse(`Age Group` == "Ages_50-64_yrs", "50-64",
ifelse(`Age Group` == "Ages_65-74_yrs", "65-74",
ifelse(`Age Group` == "Ages_75+_yrs", "75+", "other")))))))
g <- ggplot() +
geom_line(data = ageg, aes(x = Date, y = `Percent of age group with at least one dose`, color = Age)) +
scale_color_brewer(palette = "PuOr") +
#use a single y scale: a second scale_y_continuous() call would replace the first and drop its breaks and limits
scale_y_continuous(breaks = seq(0,100,25), limits = c(0,100), labels = c("0", "25%", "50%", "75%", "100%")) +
labs(title = "Percentage of People that Have Received at Least One Dose of Covid Vaccine<br>by Age Group", subtitle = "2020/12/16 - 2021/3/31", x = "", y = "") +
theme_minimal() +
theme(plot.title = element_markdown(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5), legend.position = "top", legend.title = element_text(size = 8)) +
guides(colour = guide_legend(nrow = 1))
library(plotly)
ggplotly(g) %>% layout(title = paste0("Vaccinated Rate by Different Age Group", "<br>","2020/12/16- 2021/3/31"))
By March 31, 2021, more than 75% of people over 65 had received at least one dose of a Covid-19 vaccine, a much higher percentage than in the younger groups.
# report num
yrs <- as %>%
group_by(AGE_YRS) %>%
summarise(sum = sum(n))
yrs$group <- with(yrs, ifelse(AGE_YRS >= 18 & AGE_YRS <= 29, "Ages_18-29_yrs",
ifelse(AGE_YRS >= 30 & AGE_YRS <= 39, "Ages_30-39_yrs",
ifelse(AGE_YRS >= 40 & AGE_YRS <= 49, "Ages_40-49_yrs",
ifelse(AGE_YRS >= 50 & AGE_YRS <= 64, "Ages_50-64_yrs",
ifelse(AGE_YRS >= 65 & AGE_YRS <= 74, "Ages_65-74_yrs",
ifelse(AGE_YRS >= 75, "Ages_75+_yrs", "other")))))))
# report cases by different age group
reportbygroup <- yrs %>%
group_by(group) %>%
summarise(reportnumber = sum(sum))
#join vaccinated cases to calculate report rate
report <- left_join(reportbygroup,ageg, by = c("group" = "Age Group"))
reportrate <- report %>%
filter(Date == "2021-03-31") %>% # vaccinated number until 2021/03/31
select(group, reportnumber,`People with at least one dose`) %>%
mutate(Rate = round(reportnumber*10000 / `People with at least one dose`,2)) #rate
ggplot()+
geom_col(data = reportrate, aes(x = group, y = Rate, fill = group)) + scale_fill_brewer(palette = "PuOr")+
labs(title = "Number of Reported Side Effects Cases per 10k Vaccinated People<br> by Mar 31, 2021",x="Age",y = "Number of Reported Cases") +
scale_x_discrete(labels = c("Ages_18-29_yrs" = "18-29", "Ages_30-39_yrs" = "30-39", "Ages_40-49_yrs" = "40-49", "Ages_50-64_yrs" = "50-64", "Ages_65-74_yrs" = "65-74", "Ages_75+_yrs" = "75+")) +
theme_minimal() +
theme(plot.title = element_markdown(hjust = 0.5),legend.position = "none")
By March 31, 2021, fewer than 3 of every 10,000 people over 75 who had received at least one dose of a Covid-19 vaccine reported adverse effects. In contrast, those aged 30 to 39 had the highest reporting rate, at about 6 of every 10,000 vaccinated people. So, counter-intuitively, older people appear to be less vulnerable to side effects. Some articles suggest that the immune response may be stronger in younger people, so side effects may be more pronounced for them, and our data are consistent with this. However, there may be other reasons for the lower reporting rate among older people; for example, they may be less likely than younger people to use computers, which would lower the reporting rate.
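The rate behind these figures is the number of reports divided by the number of people with at least one dose, scaled to 10,000. A minimal sketch follows; the helper name `reports_per_10k` and the counts are hypothetical and are not taken from the VAERS or CDC data.

```r
# Reports per 10,000 vaccinated people, rounded to two decimals
reports_per_10k <- function(reports, vaccinated) {
  round(reports * 10000 / vaccinated, 2)
}

# Hypothetical counts for one age group
rate <- reports_per_10k(reports = 450, vaccinated = 1500000)
print(rate)
```

Because the denominator counts people with at least one dose while a person can file more than one report, the rate is best read as reports per vaccinated person rather than as the share of people affected.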
Medical history and pre-existing illness are also important when judging who is a suitable candidate for Covid-19 vaccines, so we looked at the most common pre-existing conditions among the people who reported adverse reactions.
his <- vac %>%
filter(!is.na(AGE_YRS), AGE_YRS >= 18, VAX_DATE >= "2020-12-01", SEX != "U") %>% #keep adults with a known age, vaccination dates on or after 2020-12-01, and a known sex
filter(HISTORY != "None" & HISTORY != "NA" & HISTORY !="N/A" & HISTORY != "n/a" & HISTORY !="no" & HISTORY != "none") %>%
select(VAERS_ID,HISTORY) %>%
rename(doc_id = VAERS_ID, text = HISTORY)
#cleaning
new_stops <- c("no", "history","medical", "conditions", "historyconcurrent", "disease","patient", "relevant","chronic","none", stopwords("en"))
his$text <- iconv(his$text, "UTF-8", "UTF-8",sub='')
his$text = removePunctuation(his$text)
his$text=tolower(his$text)
his$text=removeWords(his$text, new_stops)
his$text=removeNumbers(his$text)
require(RWeka)
# Make tokenizer function
tokenizer <- function(x)
NGramTokenizer(x, Weka_control(min = 1, max = 3))
hisd <- data.frame("history" = tokenizer(his$text))
hisd <- hisd %>%
group_by(history) %>%
count(history) %>%
arrange(desc(n))
#write.csv(hisd, "/Users/hannahz/Desktop/data visualization\\history.csv")
#after counting, I manually selected the 50 most common pre-illnesses and combined entries for the same illness (e.g. hypertension and high blood pressure)
history <- read_csv("data/hanyudata/pre-illness_history_clean.csv")
purple_orange <- brewer.pal(10, "RdYlBu")
set.seed(2103)
wordcloud(history$history, history$n,
max.words = 50, colors = purple_orange)
Hypertension, asthma, and diabetes are the most common pre-existing conditions mentioned by people reporting side effects. One possibility is that people with those conditions are more vulnerable to vaccine side effects. However, it is also possible that the reported symptoms are caused by the pre-existing conditions rather than by the vaccine: people might attribute health problems that would have happened anyway to the vaccine. We have not yet found a more precise way to describe this relationship, but it is worth exploring.
nums <- vac %>%
filter(!is.na(AGE_YRS), AGE_YRS >= 18, VAX_DATE >= "2020-12-01", SEX != "U", !is.na(NUMDAYS)) %>%
select(SEX,AGE_YRS,NUMDAYS)
nums$group <- with(nums, ifelse(AGE_YRS >= 18 & AGE_YRS <= 29, "18-29",
ifelse(AGE_YRS >= 30 & AGE_YRS <= 39, "30-39",
ifelse(AGE_YRS >= 40 & AGE_YRS <= 49, "40-49",
ifelse(AGE_YRS >= 50 & AGE_YRS <= 64, "50-64",
ifelse(AGE_YRS >= 65 & AGE_YRS <= 74, "65-74",
ifelse(AGE_YRS >= 75, "75+", "other")))))))
numbyages <- nums %>%
group_by(SEX,group) %>%
summarise(mean = mean(NUMDAYS))
ggplot(numbyages, aes(x = group, y = mean, fill = SEX)) + geom_col(position = "dodge", width = 0.6) + labs(title = "Side Effects Average Onset Days After Vaccination", x ="Age", y = "Number of Days after Vaccination") +
theme_minimal() + scale_y_continuous(breaks = seq(1,6,1), limits = c(0,6)) + theme(plot.title = element_text(hjust = 0.5), legend.title = element_blank())+scale_fill_manual(values = c("#fdb863","#2d004b"),labels = c("Female", "Male"))
In general, younger people seem to be more sensitive to side effects: on average they reported symptoms in under 3 days after vaccination, while older people tended to notice symptoms later. In the younger groups, side effects tend to set in earlier for males; in the older groups, by contrast, females seem to experience symptoms earlier.
#select symptoms out
sympword <- covid %>%
select(SYMPTOM1, SYMPTOM2, SYMPTOM3, SYMPTOM4, SYMPTOM5)
# list all the symptoms out in a column
symd1 <- data.frame(symptom=unlist(sympword, use.names = FALSE))
#filter out na value
symd1 <- na.omit(symd1)
common10 <- symd1 %>%
group_by(symptom) %>%
count(symptom) %>%
arrange(desc(n)) %>%
ungroup() %>%
mutate(rank = row_number()) %>%
filter(rank <= 10)
ggplot(common10) +
geom_col(aes(x = n, y = reorder(symptom, n)), fill = "#fdd49e", width = 0.7) +
scale_x_continuous(name = "Number of Symptom Reports", breaks = seq(0,8000,1000)) +
labs(title = "Top 10 Side Effect Symptoms", y = "") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
senti <- vac %>%
filter(!is.na(AGE_YRS), AGE_YRS >= 18, VAX_DATE >= "2020-12-01", SEX != "U") %>%
select(VAERS_ID,SYMPTOM_TEXT) %>%
rename(doc_id = VAERS_ID, text = SYMPTOM_TEXT)
senti$text <- iconv(senti$text, "UTF-8", "UTF-8",sub='')
#convert to a df corpus
df_source <- DataframeSource(senti)
df_corpus <- VCorpus(df_source)
df_corpus
## <<VCorpus>>
## Metadata: corpus specific: 0, document level (indexed): 0
## Content: documents: 30095
#filter out neutral words
new_stops <- c("can", "make", "patient", "vaccine", "received", "included", "medical", "experienced", "outcome", "feeling", "receiving", "time", "female", "information", "immunization", "shot", "nurse", "doctor", "action", "including", "noted", "shoulder", "physician", "visit", "reporter", "ethics", "resident", stopwords("en"))
#clean corpus
clean_corpus <- function(corpus){
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, c(new_stops))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, stripWhitespace)
return(corpus)
}
cleancor <- clean_corpus(df_corpus)
top_dtm <- DocumentTermMatrix(cleancor)
# convert to tidytext
top_td <- tidy(top_dtm)
#using NRC
# I had some difficulty downloading the NRC lexicon through the tidytext package (probably because I'm using a VPN), so I downloaded it manually
txt <- read.table("data/hanyudata/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt")
txt <- txt %>%
filter(V3 != 0) %>%
select(V1, V2)
nrc <- dplyr::rename(txt, word = V1, sentiment = V2)
emotion <- top_td %>%
inner_join(nrc, by = c(term = "word")) %>%
group_by(sentiment) %>%
count(term) %>%
arrange(desc(n)) %>%
mutate(rank = row_number()) %>%
mutate(number = n/1000) %>%
filter(rank <= 10)
ggplot(data = emotion, aes(reorder(term, number), number, fill=sentiment)) +geom_bar(stat="identity", show.legend = FALSE) +scale_y_continuous(breaks = seq(0,6,2), labels = c("0" = "0", "2" = "2k", "4" = "4k", "6" = "6k"))+facet_wrap(~sentiment, scales="free_y", ncol=5) +labs(y = "Number of Times Expressed", x = NULL, title = "Emotion Words Expressed in Symptoms Description") +coord_flip() + scale_fill_brewer(palette = "Paired") + theme(plot.title = element_text(hjust = 0.5))
Negative, disgust, fear, and sadness are the most frequently expressed emotions in the symptom descriptions, which is reasonable because people were describing physical discomfort.
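The lexicon lookup behind this chart is an inner join between the document-term counts and the word-to-emotion table, followed by a count per emotion and term. A base R sketch on toy data follows; both data frames are hypothetical, and the lexicon entries are illustrative rather than copied from the NRC lexicon.

```r
# Toy document-term counts (hypothetical), shaped like the tidy() output above
term_counts <- data.frame(
  document = c("1", "1", "2", "3"),
  term     = c("pain", "fever", "pain", "swelling"),
  count    = c(2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Toy lexicon (hypothetical entries); one word may map to several emotions
lexicon <- data.frame(
  word      = c("pain", "pain", "fever"),
  sentiment = c("fear", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Inner join: terms missing from the lexicon (here "swelling") drop out
matched <- merge(term_counts, lexicon, by.x = "term", by.y = "word")

# Number of documents mentioning each term, per emotion
emotion_counts <- aggregate(count ~ sentiment + term, data = matched, FUN = length)
names(emotion_counts)[3] <- "n"

print(emotion_counts)
```

Because a word can carry several emotion labels, the same term can appear in more than one facet, and words outside the lexicon contribute nothing to the totals.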
# This data set is derived from the allocation data set above and includes the Pfizer and Moderna vaccine allocations for the 50 states and the District of Columbia before 2021-03-31.
# The state election results come from the data provided in the Week 5 lecture.
# The data cleaning was done in Excel.
allocation <- read.csv('data/hldata/clean/allocation_clean.csv')
allocation$allocation <- allocation$allocation/10000
# bar
p1<-ggplot(allocation, aes(x = state1, y=allocation,
col = manufacturer ,
fill = manufacturer )) +
geom_bar(stat = 'identity')+
labs(x = "State", y = "Allocation (10k)",
title = "Vaccine allocation of Pfizer and Moderna - by state")+
theme_light()+
scale_color_manual(values = c('purple','orange'),
breaks = c('pfizer','moderna'))+
scale_fill_manual(values = c('purple','orange'),
breaks = c('pfizer','moderna'))+
coord_flip()
p1
#map
us_states <- map_data("state")
us_states <- us_states %>%
dplyr::select(-subregion) %>%
dplyr::rename(state = region)
us_states$state <- str_to_title(us_states$state) # Change the first letter of state to Uppercase
us_states <- us_states %>%
filter(state %in% allocation$state1)
alo.g <- left_join(us_states, allocation, by = c('state' = 'state1'))
ggplot(data = alo.g, aes(x = long, y = lat, group = group))+
geom_polygon(aes(fill = allocation), color = "black")+
scale_fill_distiller(type = "seq", palette = "Oranges",
breaks = seq(0, 700, by = 100),
limits = c(0, 700),
direction = 1)+ # direction must be 1 or -1; 1 maps larger allocations to darker shades
facet_grid(manufacturer ~ .)+
theme_map()+
theme(legend.position = "top")
Vaccine allocation does not favor either brand in any state: in every state, the numbers of Pfizer and Moderna doses allocated are roughly the same.
# This data set is derived from the VAERS data set and the daily vaccinations data set above and includes the number of people vaccinated (before 2021-03-31) and the number of side effect cases reported (from 2020-12-14 to 2021-03-31) for the 50 states and the District of Columbia.
# The state election results come from the data provided in the Week 5 lecture.
# The data cleaning was done in Excel.
case_vac <- read.csv('data/hldata/clean/case_vac_clean.csv')
# rate represents the proportion of side effects cases per 100,000 people who have been vaccinated
case_vac <- case_vac %>%
mutate(rate = 100000*case/vaccinations)
# bar
p2<-ggplot(case_vac, aes(x = state1, y=rate,
col = party ,
fill = party )) +
geom_bar(stat = 'identity')+
labs(x = "State", y = "Side effect rate (per 100k vaccinated)",
title = "Side effect rate - by state")+
theme_light()+
scale_color_manual(values = c('Blue','Red'),
breaks = c('DEMOCRAT','REPUBLICAN'))+
scale_fill_manual(values = c('Blue','Red'),
breaks = c('DEMOCRAT','REPUBLICAN'))+
coord_flip()
p2
states.sp <- readOGR(dsn = "data/hldata/cb_2018_us_state_5m/cb_2018_us_state_5m.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/hannahz/Desktop/Group_L_VaccineSideeffect/data/hldata/cb_2018_us_state_5m/cb_2018_us_state_5m.shp", layer: "cb_2018_us_state_5m"
## with 56 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
# shape file source: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
# leaflet
#merge to the shapefile
rate.sp <- states.sp
rate.sp@data<-rate.sp@data %>%
left_join(case_vac, by = c('NAME' = 'state1'))
sub.d<-subset(rate.sp, party == 'DEMOCRAT' )
sub.r<-subset(rate.sp, party == 'REPUBLICAN' )
#pop up content
rate.popup.content <- paste("State:",rate.sp@data$NAME ,"<br/>",
"Report rate:",round(rate.sp@data$rate,2),"per 100k vaccinated" ,"<br/>",
"Party:",rate.sp@data$party ,"<br/>")
rate.popup.content.d <- paste("State:",sub.d@data$NAME ,"<br/>",
"Report rate:",round(sub.d@data$rate,2),"per 100k vaccinated" ,"<br/>",
"Party:",sub.d@data$party ,"<br/>")
rate.popup.content.r <- paste("State:",sub.r@data$NAME ,"<br/>",
"Report rate:",round(sub.r@data$rate,2),"per 100k vaccinated" ,"<br/>",
"Party:",sub.r@data$party ,"<br/>")
#
leaflet() %>%
addProviderTiles("OpenStreetMap.Mapnik") %>%
setView(lat = 37, lng = -95, zoom = 4) %>%
addPolygons(group="rate",
data =rate.sp,
stroke = TRUE,
smoothFactor = 0.5,
weight=1,
color = '#333333',
opacity=1,
fillColor = ~colorQuantile("Oranges", rate)(rate),
fillOpacity = 1,
popup = rate.popup.content) %>%
addPolygons(group="Democrat",
data=subset(rate.sp, party == 'DEMOCRAT' ),
popup = rate.popup.content.d,
opacity = 1.0, stroke = TRUE,
color = "blue", weight=1) %>%
addPolygons(group="Republican",
data=subset(rate.sp, party == 'REPUBLICAN' ),
popup = rate.popup.content.r,
opacity = 1.0, stroke = TRUE,
color = "red", weight=1) %>%
addLayersControl(overlayGroups = c("rate","Democrat","Republican"),
options = layersControlOptions(collapsed = FALSE))
The reporting rate of vaccine side effects in each state does not appear to be related to which party won the state in the 2020 election. New York, however, has the highest reporting rate, more than double that of Montana, the second highest.
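The visual impression could be checked formally by comparing the two groups of states with a rank-based test. The sketch below uses `stats::wilcox.test` on hypothetical rates; the `toy` data frame is invented for illustration and does not contain the real state values. With the real data, the same call could presumably be run on `case_vac`, which has `rate` and `party` columns.

```r
# Hypothetical per-state rates (reports per 100k vaccinated), not the real values
toy <- data.frame(
  party = c("DEMOCRAT", "DEMOCRAT", "DEMOCRAT", "DEMOCRAT",
            "REPUBLICAN", "REPUBLICAN", "REPUBLICAN", "REPUBLICAN"),
  rate  = c(31.2, 44.8, 27.5, 52.1, 29.9, 47.3, 35.6, 41.0)
)

# Wilcoxon rank-sum test: do the two groups differ in location?
result <- wilcox.test(rate ~ party, data = toy)
print(result$p.value)
```

A rank-based test is used here because a single extreme state, such as New York in the real data, would dominate a comparison of means.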
# This data set is derived from the VAERS data set above and includes the number of side effect cases reported (from 2020-12-14 to 2021-03-31) by manufacturer for the 50 states and the District of Columbia.
# The state election results come from the data provided in the Week 5 lecture.
# The data cleaning was done in Excel.
case_manu <- read.csv('data/hldata/clean/case_manu_clean.csv')
# bar
p3<-ggplot(case_manu, aes(x = state1, y=case_manu,
col = manu ,
fill = manu )) +
geom_bar(stat = "identity", position = position_dodge())+
labs(x = "State", y = "Cases",
title = "Side effect cases reported of Pfizer and Moderna - by state")+
theme_light()+
scale_color_manual(values = c('purple','orange'),
breaks = c('pfizer','moderna'))+
scale_fill_manual(values = c('purple','orange'),
breaks = c('pfizer','moderna'))+
coord_flip()
p3
# leaflet
#merge to the shapefile
cm.sp.p <- states.sp
cm.sp.p@data<-cm.sp.p@data %>%
left_join(subset(case_manu, manu == "pfizer"), by = c('NAME' = 'state1'))
cm.sp.m <- states.sp
cm.sp.m@data<-cm.sp.m@data %>%
left_join(subset(case_manu, manu == "moderna"), by = c('NAME' = 'state1'))
#pop up content
cm.p.popup.content <- paste("State:",cm.sp.p@data$NAME ,"<br/>",
"Report case:",cm.sp.p@data$case_manu ,"<br/>",
"Manufacturer:",cm.sp.p@data$manu ,"<br/>",
"Party:",cm.sp.p@data$party ,"<br/>")
cm.m.popup.content <- paste("State:",cm.sp.m@data$NAME ,"<br/>",
"Report case:",cm.sp.m@data$case_manu ,"<br/>",
"Manufacturer:",cm.sp.m@data$manu ,"<br/>",
"Party:",cm.sp.m@data$party ,"<br/>")
#
leaflet() %>%
addProviderTiles("OpenStreetMap.Mapnik") %>%
setView(lat = 37, lng = -95, zoom = 4) %>%
addPolygons(group="pfizer",
data =cm.sp.p,
stroke = TRUE,
smoothFactor = 0.5,
weight=1,
color = '#333333',
opacity=1,
fillColor = ~colorQuantile("Purples", case_manu)(case_manu),
fillOpacity = 0.5,
popup = cm.p.popup.content) %>%
addPolygons(group="moderna",
data =cm.sp.m,
stroke = TRUE,
smoothFactor = 0.5,
weight=1,
color = '#333333',
opacity=1,
fillColor = ~colorQuantile("Oranges", case_manu)(case_manu),
fillOpacity = 0.5,
popup = cm.m.popup.content) %>%
addLayersControl(overlayGroups = c("pfizer","moderna"),
options = layersControlOptions(collapsed = FALSE))
According to the previous visualizations, there is no significant difference between the numbers of Moderna and Pfizer doses allocated to each state. However, more side-effect cases were reported for Pfizer’s vaccine.